An Author Verification Approach Based on Differential Features: Notebook for PAN at CLEF 2015
Authors
Abstract
We describe the approach that we submitted to the 2015 PAN competition [7] for the author identification task. The task consists of determining whether an unknown document was written by the same author as a given set of documents, all of which share a single author. We propose a machine learning approach based on a number of different features that characterize documents from widely different points of view. We construct non-overlapping groups of homogeneous features, use a random forest regressor for each feature group, and combine the outputs of all regressors by taking their arithmetic mean. We train separate regressors for each language. Our approach ranked first in the final ranking for Spanish.
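The ensemble structure described above (one regressor per non-overlapping feature group, outputs averaged, one model per language) can be sketched in stdlib-only Python. This is a minimal illustration under stated assumptions, not the authors' implementation: feature vectors are assumed to be precomputed for each verification problem, the target is assumed to be 1.0 for same-author and 0.0 for different-author problems, and `TinyRandomForest`, `GroupedForestVerifier`, the group names, the toy data, and all hyperparameters are hypothetical. The tiny forest (bagged, depth-limited regression trees with a random feature subset at each split) stands in for a library random forest regressor.

```python
import random
from statistics import mean


def _sse(ys):
    """Sum of squared errors around the mean (the split criterion)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def _build_tree(X, y, depth, max_depth, n_try, rng):
    """Grow a regression tree; leaves hold the mean target of their samples."""
    if depth >= max_depth or len(set(y)) == 1:
        return mean(y)
    best = None
    # Consider only a random subset of features at this split.
    for f in rng.sample(range(len(X[0])), min(n_try, len(X[0]))):
        # Thresholds exclude the largest value so both sides are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            cost = _sse(left) + _sse(right)
            if best is None or cost < best[0]:
                best = (cost, f, t)
    if best is None:  # no usable split among the sampled features
        return mean(y)
    _, f, t = best

    def grow(idx):
        return _build_tree([X[i] for i in idx], [y[i] for i in idx],
                           depth + 1, max_depth, n_try, rng)

    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, grow(left_idx), grow(right_idx))


def _predict_tree(node, x):
    # Internal nodes are (feature, threshold, left, right); leaves are numbers.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class TinyRandomForest:
    """Hypothetical minimal random forest regressor (bagged regression trees)."""

    def __init__(self, n_trees=25, max_depth=3, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        n_try = max(1, int(len(X[0]) ** 0.5))  # assumed: sqrt(#features) per split
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap
            self.trees.append(_build_tree([X[i] for i in idx], [y[i] for i in idx],
                                          0, self.max_depth, n_try, self.rng))
        return self

    def predict(self, x):
        return mean(_predict_tree(tree, x) for tree in self.trees)


class GroupedForestVerifier:
    """One forest per non-overlapping feature group; score = arithmetic mean."""

    def __init__(self, groups, **forest_args):
        self.groups = groups  # hypothetical: group name -> list of column indices
        self.forest_args = forest_args
        self.models = {}

    def fit(self, X, y):
        for name, cols in self.groups.items():
            Xg = [[row[c] for c in cols] for row in X]
            self.models[name] = TinyRandomForest(**self.forest_args).fit(Xg, y)
        return self

    def score(self, x):
        return mean(model.predict([x[c] for c in self.groups[name]])
                    for name, model in self.models.items())


# Toy data: made-up feature vectors, target 1.0 = same author, 0.0 = different.
groups = {"lexical": [0, 1], "character": [2, 3]}
X = [[0.1, 0.2, 0.1, 0.3], [0.2, 0.1, 0.2, 0.2],
     [0.9, 0.8, 0.7, 0.9], [0.8, 0.9, 0.9, 0.8]]
y = [1.0, 1.0, 0.0, 0.0]
# One independently trained verifier per language (only Spanish shown).
models = {"es": GroupedForestVerifier(groups, seed=1).fit(X, y)}
print(models["es"].score([0.05, 0.05, 0.05, 0.05]))
```

In this sketch each forest sees only its own feature group, and the arithmetic mean gives every group the same weight in the final score. A library forest such as scikit-learn's `RandomForestRegressor` could replace `TinyRandomForest` with a small adapter, since its `predict` expects a 2-D input.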
Similar resources
Authorship Verification: An Approach based on Random Forest: Notebook for PAN at CLEF 2015
Authorship attribution, an important problem in many areas including information retrieval, computational linguistics, law, and journalism, has attracted increasing research interest in recent years. In the Author Identification task of PAN at CLEF 2015, the main focus was on cross-genre and cross-topic author verification. We have used seve...
Style-based Distance Features for Author Verification: Notebook for PAN at CLEF 2013
In this paper we present the approach we took in our participation in the PAN 2013 Author Profiling task. It is an adaptation of our system submitted for author identification, assuming that a profile category (authors belonging to the same gender and age group) can be analyzed in the same way as an author’s style.
UniNE at CLEF 2015 Author Identification: Notebook for PAN at CLEF 2015
This paper describes and evaluates an unsupervised authorship verification model called SPATIUM-L1. The suggested strategy can be readily adapted to different languages (such as Dutch, English, Greek, and Spanish), even when genre and topic differ significantly. As features, we suggest using the k most frequent terms of the disputed text (isolated words and punctuation symbols with ...
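The distance at the core of SPATIUM-L1 can be sketched as follows, assuming it is the L1 (Manhattan) distance between the relative frequencies of the disputed text's k most frequent terms, measured once in the disputed text and once in the known text. The tokenizer (lowercased words plus single punctuation symbols), the default `k=200`, and the function names are assumptions for illustration, and the decision rule that turns this distance into a same-author answer is not shown.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase word tokens and single punctuation symbols.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def relative_freqs(tokens):
    counts = Counter(tokens)
    return {term: c / len(tokens) for term, c in counts.items()}


def spatium_l1_distance(disputed, known, k=200):
    """L1 distance over the k most frequent terms of the disputed text."""
    d_tokens = tokenize(disputed)
    top_k = [term for term, _ in Counter(d_tokens).most_common(k)]
    p_disputed = relative_freqs(d_tokens)
    p_known = relative_freqs(tokenize(known))
    # Terms absent from the known text contribute their full disputed frequency.
    return sum(abs(p_disputed[t] - p_known.get(t, 0.0)) for t in top_k)


print(spatium_l1_distance("a b a", "a b b"))
```

Because the term list comes from the disputed text only, this distance is not symmetric: swapping the two arguments can change the result.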
Author Verification Using Syntactic N-grams: Notebook for PAN at CLEF 2015
This paper describes our approach to the Author Verification task at PAN 2015. Our method builds a representation of an author’s style using the information contained in dependency trees. This information is represented as syntactic n-grams, which are used to build a vector space. Using an unsupervised machine learning approach, each instance is associated with the corresponding author using the...
Know-Center at PAN 2015 Author Identification: Notebook for PAN at CLEF 2015
Our system for the PAN 2015 authorship verification challenge is based on a two-step pre-processing pipeline. In the first step, we extract different features that capture stylometric properties, grammatical characteristics, and purely statistical properties. In the second step, we merge all those features into a single meta feature space. We train an SVM classifier on the gene...